Source | # of sentences | Average logarithmic rank |
---|---|---|
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/05/080531_ansar_burni_india.shtml | 13 | 4.17 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/07/080727_ahmadabad_new.shtml | 20 | 4.22 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/05/080525_karnataka_analysis.shtml | 11 | 4.28 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/03/080316_sarabjit_media.shtml | 16 | 4.29 |
http://newsforums.bbc.co.uk/ws/thread.jspa?threadID=7530 | 16 | 4.31 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/10/081006_sherry_jo_pp.shtml | 11 | 4.36 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/04/080403_mulayam_congress.shtml | 12 | 4.37 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/09/080903_flood_kosi_level.shtml | 11 | 4.37 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/03/080311_forced_marriage.shtml | 16 | 4.39 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/07/080703_oval_controversy.shtml | 11 | 4.40 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/05/080513_malya_prabhashji.shtml | 11 | 4.41 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2007/12/071220_saurav_intrw.shtml | 12 | 4.42 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/12/081201_patil_rtesigns_vv.shtml | 11 | 4.47 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/12/081209_congress_upbeat_ac.shtml | 11 | 4.47 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/04/080421_smriti_irani.shtml | 12 | 4.48 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/news/story/2008/03/080324_iraq_us_soldiers.shtml | 11 | 4.48 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/08/080804_jasdev_iv.shtml | 12 | 4.48 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/04/080427_judges_nawaz.shtml | 11 | 4.48 |
http://newsforums.bbc.co.uk/ws/thread.jspa?forumID=5166 | 15 | 4.48 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/04/080411_pradeep_column.shtml | 18 | 4.48 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/07/080720_pradeep_column.shtml | 13 | 4.49 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/11/081130_mumbai_analysis_awa.shtml | 12 | 4.51 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/10/081016_nandita_interview.shtml | 29 | 4.52 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/03/080322_tibet_vivechana.shtml | 16 | 4.53 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/10/081027_vivechana_rights_mk.shtml | 22 | 4.54 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/business/story/2008/10/081011_imf_warning_ak.shtml | 12 | 4.54 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/12/081224_ravikant_releation_arya.shtml | 14 | 4.54 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/11/081128_mumbai_timeline_pa.shtml | 12 | 4.56 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/07/080703_india_lankapreview.shtml | 12 | 4.56 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/04/080401_nregs_analysis.shtml | 11 | 4.56 |

Second table (sources with the highest average logarithmic rank):
Source | # of sentences | Average logarithmic rank |
---|---|---|
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/learningenglish/story/2008/06/080608_homophones.shtml | 11 | 6.50 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/11/081115_yog_anulomvilom.shtml | 16 | 5.76 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/08/080808_yoga_dhanurasan.shtml | 13 | 5.68 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/05/080517_yoga_singhgarjan.shtml | 12 | 5.68 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/news/story/2008/07/080702_askus_lightyear_days.shtml | 13 | 5.64 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/03/080318_dudhwa_rhino.shtml | 14 | 5.61 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/news/story/2007/09/070911_askus_keyboard.shtml | 13 | 5.60 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/03/080301_yoga1.shtml | 16 | 5.57 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/11/081101_yog_samkon_awa.shtml | 12 | 5.55 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/03/080328_pakistan_kathak.shtml | 11 | 5.53 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/news/story/2008/06/080613_askus_chameleon.shtml | 17 | 5.47 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/12/081229_bollywood_2008.shtml | 15 | 5.46 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/10/081018_yoga_sheetli.shtml | 15 | 5.45 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/03/080328_yoga_skandh.shtml | 21 | 5.44 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/news/story/2008/01/080102_askus_flag_gun.shtml | 13 | 5.40 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/04/080427_mumbai_deccan.shtml | 14 | 5.40 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2007/12/071231_anurag_kashyap.shtml | 11 | 5.37 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/01/080112_pankaj_kapoor.shtml | 16 | 5.36 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/02/080224_railbudget_highspeed.shtml | 13 | 5.36 |
http://newsforums.bbc.co.uk/ws/thread.jspa?forumID=4611 | 14 | 5.36 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/entertainment/story/2008/01/080126_nida_column.shtml | 12 | 5.34 |
http://newsforums.bbc.co.uk/ws/thread.jspa?threadID=4745 | 16 | 5.34 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/05/080527_chennai_deccan.shtml | 11 | 5.34 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/regionalnews/story/2008/10/081002_shastrijee.shtml | 14 | 5.34 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/09/080926_yoga_utthan_awa.shtml | 14 | 5.33 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/learningenglish/story/2007/06/070605_blend_words.shtml | 11 | 5.33 |
http://newsforums.bbc.co.uk/ws/thread.jspa?forumID=6324 | 11 | 5.33 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/07/080725_yoga_bhujangasan.shtml | 13 | 5.33 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/sport/story/2008/05/080504_mumbai_delhi.shtml | 12 | 5.32 |
http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/hindi/science/story/2008/09/080913_yoga_janu_pasch.shtml | 19 | 5.32 |
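In this subsection we replace average word length with average logarithmic word rank. We take the logarithm of the rank so that rare words (those with high ranks) are penalized only moderately: a word of rank 10,000 contributes only four times as much to the average as a word of rank 10, rather than a thousand times as much.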
Query for the first table (sources with more than 10 sentences, ordered by ascending average logarithmic rank):
select source,
       count(distinct i_s.s_id) as cnt_s,
       round(avg(log(w.w_id-100)),2) as av
from sources so, inv_so i_s, inv_w i, words w
where so.so_id=i_s.so_id
  and i_s.s_id=i.s_id
  and i.w_id=w.w_id
  and w.w_id>100
group by source
having cnt_s>10
order by av
LIMIT 30;
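
The aggregation performed by the query can be mirrored in a few lines of Python. This is only a minimal sketch. It assumes that the word occurrences have already been exported as `(source, s_id, w_id)` tuples, that the rank of a word is `w_id - 100` (as the query suggests), and that `log` is the natural logarithm. The function name `average_log_rank` and its parameters are hypothetical and not part of the original database.

```python
import math
from collections import defaultdict

RANK_OFFSET = 100  # assumption taken from the query: rank = w_id - 100


def average_log_rank(occurrences, more_than=10, limit=30):
    """Return (source, sentence count, average log rank) rows, lowest average first.

    occurrences: iterable of (source, s_id, w_id) tuples, one per word occurrence.
    """
    log_sum = defaultdict(float)   # sum of log ranks per source
    n_words = defaultdict(int)     # number of counted occurrences per source
    sentences = defaultdict(set)   # distinct sentence ids per source

    for source, s_id, w_id in occurrences:
        if w_id <= RANK_OFFSET:    # mirrors "w.w_id>100" in the where clause
            continue
        log_sum[source] += math.log(w_id - RANK_OFFSET)
        n_words[source] += 1
        sentences[source].add(s_id)

    rows = [
        (source, len(sentences[source]), round(log_sum[source] / n_words[source], 2))
        for source in log_sum
        if len(sentences[source]) > more_than   # mirrors "having cnt_s>10"
    ]
    rows.sort(key=lambda row: row[2])            # mirrors "order by av"
    return rows[:limit]                          # mirrors "LIMIT 30"
```

Sorting with `reverse=True` instead would list the sources with the highest averages first.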
6.4.2.1 Average word length for different sources
6.4.2.3 Sources consisting of many / few words with frequency 1
6.4.2.4 Sources with low / high average word length of rare words